Fuzzy State Aggregation and Off-policy Reinforcement Learning for Stochastic Environments
Abstract
Reinforcement learning is one of the more attractive machine learning technologies because of its unsupervised learning structure and its ability to keep learning even as the environment in which it operates changes. Through function approximation of the domain's policy, this ability to learn without supervision in a changing environment can be applied to complex domains. The function approximation presented here is fuzzy state aggregation. This article combines fuzzy state aggregation with two policy hill-climbing methods, Win or Learn Fast (WoLF) and policy-dynamics-based WoLF (PD-WoLF), which exceed the learning rate and performance of the earlier combination of fuzzy state aggregation with Q-learning. Tests in the TileWorld domain show that policy hill climbing performs better than the existing Q-learning implementations.
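As a rough illustration of how the pieces named in the abstract might fit together, the following Python sketch runs Bowling and Veloso's WoLF policy hill climbing (WoLF-PHC) over fuzzy aggregate states. Everything specific here is an assumption, not this article's method: the triangular membership functions over a one-dimensional state, the membership-weighted Q-value and policy updates, the mixture policy used for action selection, the hypothetical names `memberships` and `FuzzyWoLFPHC`, and all parameter values. PD-WoLF, which changes the win/lose criterion, is not shown.

```python
import random


def memberships(x, centers):
    """Triangular fuzzy memberships of scalar x over evenly spaced centers.

    Assumed scheme for illustration; degrees are normalized to sum to 1.
    """
    width = centers[1] - centers[0]
    mu = [max(0.0, 1.0 - abs(x - c) / width) for c in centers]
    total = sum(mu)
    if total == 0.0:  # x far outside the covered range: snap to nearest center
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        mu = [0.0] * len(centers)
        mu[nearest] = 1.0
        return mu
    return [m / total for m in mu]


class FuzzyWoLFPHC:
    """Hypothetical sketch: WoLF-PHC with one policy per fuzzy aggregate state."""

    def __init__(self, centers, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04, seed=0):
        k = len(centers)
        self.centers, self.n = centers, n_actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.q = [[0.0] * n_actions for _ in range(k)]
        self.pi = [[1.0 / n_actions] * n_actions for _ in range(k)]
        self.avg_pi = [[1.0 / n_actions] * n_actions for _ in range(k)]
        self.count = [0] * k
        self.rng = random.Random(seed)

    def q_value(self, mu, a):
        # Aggregated Q(x, a): membership-weighted sum of per-aggregate values.
        return sum(m * qi[a] for m, qi in zip(mu, self.q))

    def act(self, x):
        mu = memberships(x, self.centers)
        # Mixture policy (assumption): blend aggregate policies by membership.
        probs = [sum(m * p[a] for m, p in zip(mu, self.pi)) for a in range(self.n)]
        return self.rng.choices(range(self.n), weights=probs)[0]

    def update(self, x, a, r, x_next, done=False):
        mu = memberships(x, self.centers)
        target = r
        if not done:
            mu_next = memberships(x_next, self.centers)
            target += self.gamma * max(self.q_value(mu_next, b) for b in range(self.n))
        td_error = target - self.q_value(mu, a)
        for i, m in enumerate(mu):
            if m == 0.0:
                continue
            # Q-learning step, scaled by this aggregate state's membership.
            self.q[i][a] += self.alpha * m * td_error
            self._hill_climb(i, m)

    def _hill_climb(self, i, weight):
        q, pi, avg = self.q[i], self.pi[i], self.avg_pi[i]
        # Update the running estimate of the average policy.
        self.count[i] += 1
        for b in range(self.n):
            avg[b] += (pi[b] - avg[b]) / self.count[i]
        # WoLF: small step when winning (current beats average), large when losing.
        winning = sum(p * v for p, v in zip(pi, q)) > sum(p * v for p, v in zip(avg, q))
        delta = weight * (self.delta_win if winning else self.delta_lose)
        best = max(range(self.n), key=lambda b: q[b])
        # Move probability mass to the greedy action without going negative.
        moved = 0.0
        for b in range(self.n):
            if b != best:
                step = min(pi[b], delta / (self.n - 1))
                pi[b] -= step
                moved += step
        pi[best] += moved


if __name__ == "__main__":
    # Toy one-step task (assumed): action 1 pays 1, action 0 pays 0.
    agent = FuzzyWoLFPHC(centers=[0.0, 0.5, 1.0], n_actions=2)
    env_rng = random.Random(1)
    for _ in range(2000):
        x = env_rng.random()
        a = agent.act(x)
        agent.update(x, a, 1.0 if a == 1 else 0.0, x, done=True)
    print([round(p[1], 2) for p in agent.pi])
```

The two step sizes capture the WoLF idea: the policy moves cautiously (`delta_win`) when its expected value beats that of the average policy and quickly (`delta_lose`) otherwise. A fuzzy Q-learning baseline of the kind the abstract compares against would keep the same `q` update but act greedily (or ε-greedily) on `q_value` instead of maintaining `pi`.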
Similar resources
Fuzzy State Aggregation and Policy Hill Climbing for Stochastic Environments
Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and ability to continually learn even as the operating environment changes. Additionally, applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual ag...
Adaptive Critic Based Adaptation of A Fuzzy Policy Manager for A Logistic System
We show that a reinforcement learning method, adaptive critic based approximate dynamic programming, can be used to create fuzzy policy managers for adaptive control of a logistic system. Two different architectures are used for the policy manager: a feed-forward neural network and a fuzzy rule base. For both architectures, policy managers are trained that outperform LP- and GA-derived fixed po...
Policy Improvement for several Environments Extended Version
In this paper we state a generalized form of the policy improvement algorithm for reinforcement learning. This new algorithm can be used to find stochastic policies that optimize single-agent behavior for several environments and reinforcement functions simultaneously. We first introduce a geometric interpretation of policy improvement, define a framework to apply one policy to several envir...
Policy Improvement for several Environments
In this paper we state a generalized form of the policy improvement algorithm for reinforcement learning. This new algorithm can be used to find stochastic policies that optimize single-agent behavior for several environments and reinforcement functions simultaneously. We first introduce a geometric interpretation of policy improvement, define a framework to apply one policy to several envir...
Action Dependent State Space Abstraction for Hierarchical Learning Systems
To operate effectively in complex environments, learning agents have to selectively ignore irrelevant details by forming useful abstractions. In this paper we outline a formulation of abstraction for reinforcement learning approaches to stochastic decision problems by extending one of the recent minimization models, known as ε-reduction. The technique presented here extends ε-reduction to SMDPs ...